
    KnetMiner - An integrated data platform for gene mining and biological knowledge discovery

    Hassani-Pak K. KnetMiner - An integrated data platform for gene mining and biological knowledge discovery. Bielefeld: Universität Bielefeld; 2017. Discovery of novel genes that control important phenotypes and diseases is one of the key challenges in the biological sciences. Now, in the post-genomics era, scientists have access to a vast range of genomes, genotypes, phenotypes and ‘omics data which, when used systematically, can help them gain new insights and make faster discoveries. However, the volume and diversity of such un-integrated data is often seen as a burden that only those with specialist bioinformatics skills, but often only minimal specialist biological knowledge, can penetrate. Therefore, new tools are required to allow researchers to connect, explore and compare large-scale datasets to identify the genes and pathways that control important phenotypes and diseases in plants, animals and humans. KnetMiner, with a silent "K" and standing for Knowledge Network Miner, is a suite of open-source software tools for integrating and visualising large biological datasets. The software mines the myriad databases that describe an organism’s biology to present links between relevant pieces of information, such as genes, biological pathways, phenotypes and publications, with the aim of providing leads for scientists who are investigating the molecular basis of a particular trait. The KnetMiner approach is based on 1) integration of heterogeneous, complex and interconnected biological information into a knowledge graph; 2) text mining to enrich the knowledge graph with novel relations extracted from the literature; 3) graph queries of varying depths to find paths between genes and evidence nodes; 4) an evidence-based gene rank algorithm that combines graph and information theory; and 5) fast search and interactive knowledge visualisation techniques.
Overall, [KnetMiner](http://knetminer.rothamsted.ac.uk) is a publicly available resource that helps scientists trawl diverse biological databases for clues to design better crop varieties and understand diseases. The key strength of KnetMiner is that it includes the end user in the “interactive” knowledge discovery process, with the goal of supporting human intelligence with machine intelligence.
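The graph-query and gene-ranking steps listed above can be illustrated with a minimal Python sketch. It assumes a toy, undirected knowledge graph stored as adjacency lists and a query that has already matched a set of evidence nodes; all node names are hypothetical. Each gene is scored by summing an inverse-frequency weight, log(N / n_e), over the evidence nodes it reaches within a fixed number of hops, so evidence shared by many genes counts for less. This is one simple way of combining graph reachability with an information-theoretic weight, not KnetMiner's published ranking formula.

```python
import math
from collections import deque

# Hypothetical toy knowledge graph: undirected adjacency lists.
GRAPH = {
    "GeneA": ["ProteinA"],
    "GeneB": ["ProteinB"],
    "GeneC": ["ProteinC"],
    "ProteinA": ["GeneA", "PathwayX", "Pub1"],
    "ProteinB": ["GeneB", "PathwayX"],
    "ProteinC": ["GeneC", "Pub2"],
    "PathwayX": ["ProteinA", "ProteinB"],
    "Pub1": ["ProteinA"],
    "Pub2": ["ProteinC"],
}


def reachable_evidence(graph, start, evidence, max_depth):
    """Breadth-first search from `start`; return evidence nodes within max_depth hops."""
    seen = {start}
    found = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if node in evidence and node != start:
            found.add(node)
        if depth == max_depth:
            continue  # do not expand beyond the query depth
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return found


def rank_genes(graph, genes, evidence, max_depth=2):
    """Rank genes by the summed inverse-frequency weight of the evidence they reach."""
    hits = {g: reachable_evidence(graph, g, evidence, max_depth) for g in genes}
    n_genes = len(genes)
    # How many genes reach each evidence node (always >= 1 for a node being scored).
    freq = {e: sum(e in found for found in hits.values()) for e in evidence}
    scores = {
        g: sum(math.log(n_genes / freq[e]) for e in found)
        for g, found in hits.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    ranking = rank_genes(GRAPH, ["GeneA", "GeneB", "GeneC"], {"PathwayX", "Pub1", "Pub2"})
    for gene, score in ranking:
        print(f"{gene}\t{score:.3f}")
```

With `max_depth=2`, GeneA reaches both PathwayX (shared with GeneB) and Pub1 (unique to GeneA) and ranks first; increasing `max_depth` lets longer paths contribute, which loosely mirrors the idea of graph queries of varying depths.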

    KnetMiner: A comprehensive approach for supporting evidence-based gene discovery and complex trait analysis across species

    Generating new ideas and scientific hypotheses is often the result of extensive literature and database reviews, overlaid with scientists’ own novel data and a creative process of making connections that were not made before. We have developed a comprehensive approach to guide this technically challenging data integration task and to make knowledge discovery and hypothesis generation easier for plant and crop researchers. KnetMiner can digest large volumes of scientific literature and biological research to find and visualise links between the genetic and biological properties of complex traits and diseases. Here we report the main design principles behind KnetMiner and provide use cases for mining public datasets to identify unknown links between traits such as grain colour and pre-harvest sprouting in Triticum aestivum, as well as an evidence-based approach to identify candidate genes under an Arabidopsis thaliana petal size QTL. We have developed KnetMiner knowledge graphs and applications for a range of species including plants, crops and pathogens. KnetMiner is the first open-source gene discovery platform that can leverage genome-scale knowledge graphs, generate evidence-based biological networks and be deployed for any species with a sequenced genome. KnetMiner is available at http://knetminer.org.
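Identifying candidate genes under a QTL begins with an interval-overlap test: keep the genes on the QTL's chromosome whose coordinates overlap the interval, then order them by supporting evidence. The sketch below assumes each gene carries 1-based inclusive coordinates and an evidence score computed elsewhere; the `Gene` class, `genes_in_qtl` helper, gene IDs, coordinates and scores are all hypothetical placeholders, not Arabidopsis data or KnetMiner's own code.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int  # 1-based, inclusive
    score: float  # evidence score from an earlier ranking step (assumed)


def genes_in_qtl(genes, chrom, qtl_start, qtl_end):
    """Return genes overlapping [qtl_start, qtl_end] on `chrom`, highest score first."""
    hits = [
        g for g in genes
        if g.chrom == chrom and g.start <= qtl_end and g.end >= qtl_start
    ]
    return sorted(hits, key=lambda g: g.score, reverse=True)


# Hypothetical genes; IDs, coordinates and scores are placeholders.
EXAMPLE_GENES = [
    Gene("g1", "Chr1", 1_000, 2_000, 0.5),
    Gene("g2", "Chr1", 5_000, 6_000, 2.0),
    Gene("g3", "Chr2", 1_500, 2_500, 3.0),
    Gene("g4", "Chr1", 9_000, 9_500, 1.0),
]

if __name__ == "__main__":
    for gene in genes_in_qtl(EXAMPLE_GENES, "Chr1", 1_500, 8_000):
        print(gene.gene_id, gene.score)
```

An overlap test (rather than requiring the whole gene inside the interval) keeps genes that straddle the QTL boundaries, which are easy to miss otherwise.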

    Secondary cell wall composition and candidate gene expression in developing willow (Salix purpurea) stems

    The properties of the secondary cell wall (SCW) in willow largely determine the suitability of willow biomass feedstock for potential bioenergy and biofuel applications. SCW development has been little studied in willow, and it is not known how willow compares with model species, particularly the closely related genus Populus. To address this and relate SCW synthesis to candidate genes in willow, a tractable bud culture-derived system was developed in Salix purpurea, and cell wall composition and the RNA-Seq transcriptome were followed in stems during early development. A large increase in SCW deposition in the period 0–2 weeks after transfer to soil was characterised by a marked increase in xylan content, but no change in the frequency of substitution of xylan with glucuronic acid, and by increased abundance of putative transcripts for the synthesis of SCW cellulose, xylan and lignin. Histochemical staining and immunolabeling revealed that increased deposition of lignin and xylan was associated with xylem, xylem fibre cells and phloem fibre cells. Transcripts orthologous to those encoding the xylan synthase components IRX9 and IRX10 and the xylan glucuronyl transferase GUX1 in Arabidopsis were co-expressed and, as revealed by in situ hybridisation at four developmental stages, showed the same spatial pattern of expression, with abundant expression in proto-xylem, xylem fibre and ray parenchyma cells and some expression in phloem fibre cells. The results show a close similarity with SCW development in Populus species, but also give novel information on the relationship between spatial and temporal variation in xylan-related transcripts and xylan composition.

    A near-chromosome level genome assembly of the European hoverfly, Sphaerophoria rueppellii (Diptera: Syrphidae), provides comparative insights into insecticide resistance-related gene family evolution

    Background: Sphaerophoria rueppellii, a European species of hoverfly, is a highly effective beneficial predator of crop pests, including hemipteran pests such as aphids as well as thrips and coleopteran/lepidopteran larvae, in integrated pest management (IPM) programmes. It is also a key pollinator of a wide variety of important agricultural crops. No genomic information is currently available for S. rueppellii. Without genomic information for such beneficial predator species, we are unable to compare insecticide target sites, and genes encoding metabolic enzymes potentially responsible for insecticide resistance, between crop pests and their predators. These metabolic mechanisms involve several gene families: cytochrome P450 monooxygenases (P450s), ATP binding cassette transporters (ABCs), glutathione-S-transferases (GSTs), UDP-glycosyltransferases (UGTs) and carboxyl/choline esterases (CCEs). Methods and findings: In this study, a high-quality, near-chromosome-level de novo genome assembly (as well as a mitochondrial genome assembly) for S. rueppellii was generated using a hybrid approach with PacBio long-read and Illumina short-read data, followed by super-scaffolding using Hi-C data. The final assembly achieved a scaffold N50 of 87 Mb, a total genome size of 537.6 Mb and a completeness of 96%, measured as the proportion of a set of 1,658 core insect genes present as full-length genes. The assembly was annotated with 14,249 protein-coding genes. Comparative analysis revealed gene expansions of CYP6Zx P450s, epsilon-class GSTs, dietary CCEs and multiple UGT families (UGT37/302/308/430/431). Conversely, ABCs, delta-class GSTs and non-CYP6Zx P450s showed limited expansion. Differences were seen in the distributions of resistance-associated gene families across subfamilies between S. rueppellii and some hemipteran crop pests. Additionally, S. rueppellii had larger numbers of detoxification genes than other pollinator species.
Conclusion and significance: This assembly is the first published genome for a predatory member of the Syrphidae family and will serve as a useful resource for further research into the selectivity and potential tolerance of insecticides by beneficial predators. Furthermore, the expansion of some gene families often linked to insecticide resistance and selectivity may be an indicator of the capacity of this predator to detoxify IPM-selective insecticides. These findings could be exploited by targeted insecticide screens and functional studies to increase the effectiveness of IPM strategies, which aim to increase crop yields by controlling pests sustainably and effectively without impacting beneficial predator populations.
Biotechnology and Biological Sciences Research Council (BBSRC): Bayer Crop Science and Syngenta AG
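Scaffold N50, the headline contiguity figure above, is the length L such that scaffolds of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of that standard definition; the scaffold lengths in the example are hypothetical and are not taken from either assembly in this list.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Scaffolds are sorted longest first and their lengths accumulated; the N50 is the
    length of the scaffold at which the running total first reaches half the assembly size.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe form of running >= total / 2
            return length
    raise AssertionError("unreachable for non-negative lengths")


if __name__ == "__main__":
    # Hypothetical scaffold lengths (in Mb), not data from either assembly.
    print(n50([80, 70, 50, 40, 30, 20, 10]))
```

For the toy lengths the total is 300, and the running sum first reaches 150 at the 70 Mb scaffold, so N50 = 70. Because N50 depends only on the largest pieces, it says nothing about completeness, which is why the abstracts report a separate core-gene completeness figure.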

    A scaffold-level genome assembly of a minute pirate bug, Orius laevigatus (Hemiptera: Anthocoridae), and a comparative analysis of insecticide resistance-related gene families with hemipteran crop pests

    Background: Orius laevigatus, a minute pirate bug, is a highly effective beneficial predator of crop pests including aphids, spider mites and thrips in integrated pest management (IPM) programmes. No genomic information is currently available for O. laevigatus, as is the case for the majority of beneficial predators that feed on crop pests. In contrast, genomic information for crop pests is far more readily available. The lack of publicly available genomes for beneficial predators has so far limited our ability to perform comparative analyses of genes encoding potential insecticide resistance mechanisms between crop pests and their predators. These mechanisms involve several gene/protein families: cytochrome P450s (P450s), ATP binding cassette transporters (ABCs), glutathione S-transferases (GSTs), UDP-glucosyltransferases (UGTs) and carboxyl/cholinesterases (CCEs). Methods and findings: In this study, a high-quality, scaffold-level de novo genome assembly for O. laevigatus was generated using a hybrid approach with PacBio long-read and Illumina short-read data. The final assembly achieved a scaffold N50 of 125,649 bp and a total genome size of 150.98 Mb. The assembly achieved a completeness of 93.6%, measured as the proportion of a set of 1,658 core insect genes present as full-length genes. Genome annotation identified 15,102 protein-coding genes, 87% of which were assigned a putative function. Comparative analyses revealed gene expansions of sigma-class GSTs and CYP3 P450s. Conversely, the UGT gene family showed limited expansion. Differences were seen in the distributions of resistance-associated gene families at the subfamily level between O. laevigatus and some of the crop pests it targets. A target-site mutation in ryanodine receptors (I4790M, PxRyR), which has strong links to diamide resistance in crop pests and had previously been identified only in lepidopteran species, was also found in hemipteran species, including O. laevigatus.
Conclusion and significance: This assembly is the first published genome for the Anthocoridae family and will serve as a useful resource for further research into target-site selectivity issues and potential resistance mechanisms in beneficial predators. Furthermore, the expansion of gene families often linked to insecticide resistance may be an indicator of the capacity of this predator to detoxify selective insecticides. These findings could be exploited by targeted pesticide screens and functional studies to increase the effectiveness of IPM strategies, which aim to increase crop yields by controlling pests sustainably, effectively and in an environmentally friendly way, without impacting beneficial predator populations.
Biotechnology and Biological Sciences Research Council (BBSRC)
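A label such as I4790M follows the usual shorthand of reference residue, position and substituted residue: isoleucine at position 4790 replaced by methionine, with PxRyR naming the protein whose numbering is used (read here as the Plutella xylostella ryanodine receptor, the numbering commonly used for diamide-resistance mutations). The Python sketch below parses this notation and checks whether a sequence numbered in the same scheme carries the reference residue; `parse_substitution`, `describe`, `matches_reference` and the toy sequence are hypothetical helpers for illustration, not tools used in the study.

```python
import re

# Standard one-letter to three-letter amino acid codes.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_substitution(code):
    """Split e.g. 'I4790M' into ('I', 4790, 'M'): reference residue, position, new residue."""
    match = _SUBSTITUTION.fullmatch(code)
    if match is None or match[1] not in THREE_LETTER or match[3] not in THREE_LETTER:
        raise ValueError(f"not a single amino acid substitution: {code!r}")
    return match[1], int(match[2]), match[3]


def describe(code, reference="PxRyR"):
    """Expand to three-letter form, noting which protein's numbering is used."""
    ref, pos, alt = parse_substitution(code)
    return f"{THREE_LETTER[ref]}{pos}{THREE_LETTER[alt]} (numbering of {reference})"


def matches_reference(sequence, code):
    """True if `sequence` (1-based, same numbering scheme) has the reference residue."""
    ref, pos, _ = parse_substitution(code)
    return 0 < pos <= len(sequence) and sequence[pos - 1] == ref


if __name__ == "__main__":
    print(describe("I4790M"))
```

In practice, a residue in another species' RyR is only comparable at "position 4790" after aligning that protein to the reference, so `matches_reference` assumes the sequence has already been mapped onto the reference numbering.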

    Insecticide resistance mediated by an exon skipping event

    Many genes increase coding capacity by alternative exon usage. The gene encoding the insect nicotinic acetylcholine receptor (nAChR) α6 subunit, target of the bio-insecticide spinosad, is one example of this and expands protein diversity via alternative splicing of mutually exclusive exons. Here, we show that spinosad resistance in the tomato leaf miner, Tuta absoluta, is associated with aberrant regulation of splicing of Taα6, resulting in a novel form of insecticide resistance mediated by exon skipping. Sequencing of the α6 subunit cDNA from spinosad-selected and unselected strains of T. absoluta revealed that all Taα6 transcripts of the selected strain were devoid of exon 3, with comparison of genomic DNA and mRNA revealing that this is a result of exon skipping. Exon skipping cosegregated with spinosad resistance in survival bioassays, and functional characterization of this alteration using a modified human nAChR α7, a model of insect α6, demonstrated that exon 3 is essential for receptor function and hence spinosad sensitivity. DNA and RNA sequencing analyses suggested that exon skipping did not result from genetic alterations in intronic or exonic cis-regulatory elements, but rather was associated with a single epigenetic modification downstream of exon 3a and with quantitative changes in the expression of trans-acting proteins that have known roles in the regulation of alternative splicing. Our results demonstrate that the intrinsic capacity of the α6 gene to generate transcript diversity via alternative splicing can be readily exploited during the evolution of resistance, and identify exon skipping as a molecular alteration conferring insecticide resistance.
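Exon skipping of this kind is commonly quantified from RNA-seq splice-junction reads as percent spliced in (PSI): reads joining the skippable exon to its upstream and downstream neighbours support inclusion, while reads joining the upstream exon directly to the downstream exon support skipping. The Python sketch below is a simplified PSI calculation that treats exon 3 as a single cassette exon, ignoring the mutually exclusive arrangement and read-length normalisation. It is a standard, swapped-in technique rather than the analysis the authors report, and the read counts are hypothetical.

```python
def percent_spliced_in(inclusion_up, inclusion_down, skipping):
    """Simplified PSI for a cassette exon from splice-junction read counts.

    inclusion_up:   reads joining the upstream exon to the cassette exon
    inclusion_down: reads joining the cassette exon to the downstream exon
    skipping:       reads joining the upstream exon directly to the downstream exon

    The two inclusion junctions are averaged because an included transcript has two
    diagnostic junctions, whereas a skipped transcript has only one.
    """
    inclusion = (inclusion_up + inclusion_down) / 2
    total = inclusion + skipping
    if total == 0:
        raise ValueError("no junction reads support either isoform")
    return inclusion / total


if __name__ == "__main__":
    # Hypothetical counts for an unselected and a selected strain.
    print(f"unselected PSI = {percent_spliced_in(40, 36, 4):.2f}")
    print(f"selected PSI   = {percent_spliced_in(0, 0, 50):.2f}")
```

A PSI of 0 corresponds to the situation described for the selected strain, in which every transcript lacks the exon.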

    Assessing the functional coherence of modules found in multiple-evidence networks from Arabidopsis

    Background: Combining multiple evidence types from different information sources has the potential to reveal new relationships in biological systems. The integrated information can be represented as a relationship network, and clustering the network can suggest possible functional modules. The value of such modules for gaining insight into the underlying biological processes depends on their functional coherence. The challenges that we wish to address are to define and quantify the functional coherence of modules in relationship networks, so that they can be used to infer the function of as-yet unannotated proteins, to discover previously unknown roles of proteins in diseases, and to better understand the regulation and interrelationships of the different elements of complex biological systems. Results: We have defined the functional coherence of modules with respect to the Gene Ontology (GO) by considering two complementary aspects: (i) the fragmentation of the GO functional categories into the different modules and (ii) the most representative functions of the modules. We have proposed a set of metrics to evaluate these two aspects and demonstrated their utility in Arabidopsis thaliana. We selected 2,355 proteins for which experimentally established protein-protein interaction (PPI) data were available. From these we constructed five relationship networks: four based on single types of data (PPI, co-expression, co-occurrence of protein names in scientific literature abstracts, and sequence similarity) and a fifth combining these four evidence types. The ability of these networks to suggest biologically meaningful groupings of proteins was explored by applying Markov clustering and then measuring the functional coherence of the clusters.
Conclusions: Relationship networks integrating multiple evidence types are biologically informative and allow more proteins to be assigned to a putative functional module. Using additional evidence types concentrates the functional annotations in a smaller number of modules without unduly compromising their consistency. These results indicate that integrating more data sources improves the ability to uncover functional associations between proteins, both by allowing more proteins to be linked and by producing a network whose modular structure more closely reflects the hierarchy in the Gene Ontology.
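Markov clustering (MCL) alternates expansion (raising a column-stochastic flow matrix to a power, which simulates random walks) with inflation (raising each entry to a power and renormalising, which strengthens strong flows and weakens weak ones) until the matrix stops changing; nodes whose flow ends at the same attractor form a cluster. The dependency-free Python sketch below is a minimal illustration on a small adjacency matrix, with expansion 2 and inflation 2.0 as assumed defaults; it is not the implementation or parameter set used in the study, and real networks of thousands of proteins would call for a sparse, pruned implementation.

```python
def _normalize_columns(m):
    """Scale each column of square matrix `m` (in place) so it sums to 1."""
    n = len(m)
    for j in range(n):
        col_sum = sum(m[i][j] for i in range(n))
        if col_sum > 0:
            for i in range(n):
                m[i][j] /= col_sum
    return m


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def markov_cluster(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Cluster an undirected graph given as a 0/1 (or weighted) adjacency matrix."""
    n = len(adjacency)
    if n == 0:
        return []
    # Add self-loops (damps oscillation), then make every column sum to 1.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    _normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: matrix power, i.e. flow along random walks of `expansion` steps.
        expanded = m
        for _step in range(expansion - 1):
            expanded = _matmul(expanded, m)
        # Inflation: element-wise power, then renormalise (boosts strong flows).
        inflated = _normalize_columns([[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep non-negligible mass are attractors; their columns form a cluster.
    clusters = []
    for i in range(n):
        members = [j for j in range(n) if m[i][j] > 1e-9]
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

For example, calling `markov_cluster` on a graph made of two separate edges (0-1 and 2-3) returns `[[0, 1], [2, 3]]`. Raising `inflation` generally yields smaller, tighter clusters, which is the main granularity control in MCL.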